gh-113993: Allow interned strings to be mortal, and fix related issues #120520
Merged
encukou merged 79 commits into python:main on Jun 21, 2024
I've spent too much time looking at this myself, it wants more eyes :)
I spent a week learning about the string interning mechanism, and wrote up how I think it should work in an InternalDocs file I'm adding here.

I found a bunch of ... quirks, if not outright bugs. For example, we have duplicate singletons (e.g. `_Py_ID(a)` and the latin1 short string `a`). I don't think I can bring back mortal interned strings without getting my idea of the design in sync with the code, so this ended up being a big PR.

- Add an InternalDocs file describing how interning should work and how to use it. (Please review this first!)
- Add internal functions to explicitly request what kind of interning is done:
  - `_PyUnicode_InternMortal`
  - `_PyUnicode_InternImmortal`
  - `_PyUnicode_InternStatic`
- Switch uses of `PyUnicode_InternInPlace` to those.
- Disallow using `_Py_SetImmortal` on strings directly. You should use `_PyUnicode_InternImmortal` instead:
  - ... interning a immortalizing copy.
  - `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to `SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in backports, as they are now part of public API and version-specific ABI.
- Add private `_only_immortal` argument for `sys.XXX`, used in refleak test machinery.
- Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:
  - `_Py_ID`
  - `_Py_STR` (including the empty string)

  Now, when you intern a singleton, that exact singleton will be interned.
- Add a `_Py_LATIN1_CHR` macro, and use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic).
- Intern `_Py_STR` singletons at startup.
  Try this in 3.12: (click to expand)
  In 3.13 the reproducer doesn't work, but I don't think the underlying unsoundness was fixed.
- For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup.
- Beef up the tests. Cover internal details (marked with `@cpython_only`).
- Add lots of assertions.
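
For readers less familiar with interning, here is a minimal Python-level sketch of the user-visible behaviour this PR is about. It uses only the public `sys.intern` function. The helper name `build` and the sample string are made up for illustration, and nothing here touches the internal `_PyUnicode_Intern*` functions listed above.

```python
import sys


def build(parts):
    # Join at run time so the result is a freshly created string object
    # rather than a constant the compiler may already have interned.
    return "".join(parts)


a = build(["inter", "ning demo"])
b = build(["interning", " demo"])

# The two strings have equal contents; in CPython they are normally
# two distinct objects at this point.
assert a == b

ia = sys.intern(a)
ib = sys.intern(b)

# After interning, both names refer to the single canonical object
# stored in the interned-strings table.
assert ia is ib
```

Whether that canonical object is mortal (freed once the last outside reference goes away) or immortal is the internal detail the PR changes. This sketch does not try to observe it, since doing so depends on version-specific internals.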